{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# pickle, cPickle 模块：序列化 Python 对象"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`pickle` 模块实现了一种算法，可以将任意一个 `Python` 对象转化为一系列的字节，也可以将这些字节重构为一个有相同特征的新对象。\n",
    "\n",
    "由于字节可以被传输或者存储，因此 `pickle` 事实上实现了传递或者保存 `Python` 对象的功能。\n",
    "\n",
    "`cPickle` 使用 `C` 而不是 `Python` 实现了相同的算法，因此速度上要比 `pickle` 快一些。但是它不允许用户从 `pickle` 派生子类。如果子类对你的使用来说无关紧要，那么 `cPickle` 是个更好的选择。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "try:\n",
    "    import cPickle as pickle\n",
    "except:\n",
    "    import pickle"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 编码和解码"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用 `pickle.dumps()` 可以将一个对象转换为字符串（`dump string`）："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "DATA:\n",
      "[{'a': 'A', 'c': 3.0, 'b': 2}]\n",
      "PICKLE:\n",
      "(lp1\n",
      "(dp2\n",
      "S'a'\n",
      "S'A'\n",
      "sS'c'\n",
      "F3\n",
      "sS'b'\n",
      "I2\n",
      "sa.\n"
     ]
    }
   ],
   "source": [
    "data = [ { 'a':'A', 'b':2, 'c':3.0 } ]\n",
    "\n",
    "data_string = pickle.dumps(data)\n",
    "\n",
    "print \"DATA:\"\n",
    "print data\n",
    "print \"PICKLE:\"\n",
    "print data_string"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "虽然 `pickle` 编码的字符串并不一定可读，但是我们可以用 `pickle.loads()` 来从这个字符串中恢复原对象中的内容（`load string`）："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[{'a': 'A', 'c': 3.0, 'b': 2}]\n"
     ]
    }
   ],
   "source": [
    "data_from_string = pickle.loads(data_string)\n",
    "\n",
    "print data_from_string"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 编码协议"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`dumps` 可以接受一个可省略的 `protocol` 参数（默认为 0），目前有 3 种编码方式：\n",
    "\n",
    "- 0：原始的 `ASCII` 编码格式\n",
    "- 1：二进制编码格式\n",
    "- 2：更有效的二进制编码格式\n",
    "\n",
    "当前最高级的编码可以通过 `HIGHEST_PROTOCOL` 查看："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2\n"
     ]
    }
   ],
   "source": [
    "print pickle.HIGHEST_PROTOCOL"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "例如："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Pickle 1: ]q\u0001}q\u0002(U\u0001aU\u0001AU\u0001cG@\b\u0000\u0000\u0000\u0000\u0000\u0000U\u0001bK\u0002ua.\n",
      "Pickle 2: �\u0002]q\u0001}q\u0002(U\u0001aU\u0001AU\u0001cG@\b\u0000\u0000\u0000\u0000\u0000\u0000U\u0001bK\u0002ua.\n"
     ]
    }
   ],
   "source": [
    "data_string_1 = pickle.dumps(data, 1)\n",
    "\n",
    "print \"Pickle 1:\", data_string_1\n",
    "\n",
    "data_string_2 = pickle.dumps(data, 2)\n",
    "\n",
    "print \"Pickle 2:\", data_string_2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果 `protocol` 参数指定为负数，那么将调用当前的最高级的编码协议进行编码："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "�\u0002]q\u0001}q\u0002(U\u0001aU\u0001AU\u0001cG@\b\u0000\u0000\u0000\u0000\u0000\u0000U\u0001bK\u0002ua.\n"
     ]
    }
   ],
   "source": [
    "print pickle.dumps(data, -1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "从这些格式中恢复对象时，不需要指定所用的协议，`pickle.load()` 会自动识别："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Load 1: [{'a': 'A', 'c': 3.0, 'b': 2}]\n",
      "Load 2: [{'a': 'A', 'c': 3.0, 'b': 2}]\n"
     ]
    }
   ],
   "source": [
    "print \"Load 1:\", pickle.loads(data_string_1)\n",
    "print \"Load 2:\", pickle.loads(data_string_2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 存储和读取 pickle 文件"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "除了将对象转换为字符串这种方式，`pickle` 还支持将对象写入一个文件中，通常我们将这个文件命名为 `xxx.pkl`，以表示它是一个 `pickle` 文件： \n",
    "\n",
    "存储和读取的函数分别为：\n",
    "\n",
    "- `pickle.dump(obj, file, protocol=0)` 将对象序列化并存入 `file` 文件中\n",
    "- `pickle.load(file)` 从 `file` 文件中的内容恢复对象"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "将对象存入文件："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "with open(\"data.pkl\", \"wb\") as f:\n",
    "    pickle.dump(data, f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "从文件中读取："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[{'a': 'A', 'c': 3.0, 'b': 2}]\n"
     ]
    }
   ],
   "source": [
    "with open(\"data.pkl\") as f:\n",
    "    data_from_file = pickle.load(f)\n",
    "    \n",
    "print data_from_file"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "清理生成的文件："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import os\n",
    "os.remove(\"data.pkl\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
